A AfterWork 


Project Brief: Innovating with GPT Models and Python 


Background Information 


You work as a Data Engineer for a fictional telecommunications company called "TeleComLink" 
operating in the countries of "Anglovia" (English-speaking) and "Francoland" (French-speaking). 
TeleComLink recently rolled out a new chatbot system on SMS and collected customer reviews 
regarding the implementation. These reviews, in the form of SMS messages, are stored in a 
dataset named "reviews.csv." Your task is to create a data pipeline that leverages GPT-3 models 
to translate the SMS reviews from French to English for better analysis and insights. 


Problem Statement and Guidelines 


You aim to build a data pipeline using Python to extract, transform, and load the dataset, 
performing language translation and sentiment analysis. Follow the instructions below to 
complete the pipeline: 


Extract Function 
• Use the pandas library to load the "reviews.csv" dataset into a DataFrame. 
• Ensure the dataset has the following columns: reviews_id and review. 
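
As a minimal sketch of the extract step, assuming the file is a comma-separated CSV with a header row: the function name `extract` and the constant `REQUIRED_COLUMNS` are illustrative choices, not part of the brief or the starter code.

```python
import pandas as pd

# Column names taken from the brief.
REQUIRED_COLUMNS = ["reviews_id", "review"]


def extract(path="reviews.csv"):
    """Load the reviews dataset and check that the expected columns exist."""
    df = pd.read_csv(path)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        # Fail early so later steps never run on a malformed file.
        raise ValueError(f"Missing required columns: {missing}")
    # Keep only the columns the pipeline works with.
    return df[REQUIRED_COLUMNS]
```

Calling `extract("reviews.csv")` returns a DataFrame with exactly the two columns named above, or raises a `ValueError` if either one is absent.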


Transform Function 

• Use GPT-3 models for language translation to convert French reviews to 
English. 
• Apply data cleaning techniques to ensure a clean dataset containing only 
English reviews. 
• Run sentiment analysis on each review and store the result in a new column 
called sentiment. 
• Additionally, create a new column named urgency that indicates whether the 
review suggests an urgent issue based on the sentiment (positive, neutral, or 
negative). 
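
The transform steps above could be sketched as follows. This is only one possible shape, under several assumptions. `translate` and `classify` are hypothetical callables (text in, text out) that would wrap whichever GPT client you use; they are passed in so the function can be tested without network access. The brief does not define how urgency follows from sentiment, so this sketch assumes that a negative review is urgent. It also assumes that any label outside the three named sentiments falls back to neutral.

```python
import pandas as pd

VALID_SENTIMENTS = {"positive", "neutral", "negative"}


def urgency_from_sentiment(sentiment):
    # Assumption: the brief does not specify the mapping; negative => urgent.
    return sentiment == "negative"


def transform(df, translate, classify):
    """Clean, translate, and label the reviews.

    `translate` and `classify` are hypothetical str -> str callables that
    wrap the GPT model calls; `translate` is expected to return English
    reviews unchanged.
    """
    out = df.dropna(subset=["review"]).copy()
    # Basic cleaning: normalise whitespace and drop blank reviews.
    out["review"] = out["review"].astype(str).str.strip()
    out = out[out["review"] != ""].copy()

    # Translate every review to English.
    out["review"] = out["review"].map(translate)

    # Label the sentiment and normalise the model's answer.
    out["sentiment"] = out["review"].map(lambda text: classify(text).strip().lower())
    unknown = ~out["sentiment"].isin(VALID_SENTIMENTS)
    out.loc[unknown, "sentiment"] = "neutral"  # assumption: fallback label

    out["urgency"] = out["sentiment"].map(urgency_from_sentiment)
    return out.reset_index(drop=True)
```

In a real run, `translate` would send each review to the chosen GPT model with a prompt asking for an English translation, and `classify` would ask the model for one of the three sentiment labels. Keeping those calls behind plain functions also makes it easy to add retries or caching later.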


Load Function 
• Update the dataset to include the new columns: reviews_id, review, sentiment, 
and urgency. 
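
A minimal sketch of the load step, assuming the transformed DataFrame already carries the four columns and that overwriting "reviews.csv" is acceptable. The name `load` and the constant `FINAL_COLUMNS` are illustrative.

```python
import pandas as pd

# Final column order taken from the brief.
FINAL_COLUMNS = ["reviews_id", "review", "sentiment", "urgency"]


def load(df, path="reviews.csv"):
    """Write the transformed reviews to CSV with the expected columns."""
    missing = [col for col in FINAL_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Cannot load, missing columns: {missing}")
    # index=False keeps the pandas row index out of the output file.
    df[FINAL_COLUMNS].to_csv(path, index=False)
```

Because this overwrites the input file by default, it may be worth keeping a backup copy of the original "reviews.csv" while developing the pipeline.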


The data pipeline is successful if it produces a final dataset ("reviews.csv") with the updated 
columns: reviews_id, review, sentiment, and urgency. Additionally, the sentiment analysis should 
accurately reflect the sentiment expressed in each review. 
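
The structural part of this success criterion can be checked automatically. The sketch below is an illustration rather than an official grader: `check_output` is a hypothetical helper that uses only the standard-library `csv` module, so it stays independent of the pipeline code. It assumes the sentiment column holds one of the three labels named in the brief. It cannot judge whether the sentiment is accurate; that still needs a manual review of a sample of rows.

```python
import csv

EXPECTED_COLUMNS = ["reviews_id", "review", "sentiment", "urgency"]
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def check_output(csv_file):
    """Return a list of problems found in the final dataset (empty if none).

    `csv_file` is an open text file (or any iterable of CSV lines).
    """
    reader = csv.DictReader(csv_file)
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems
    for row_no, row in enumerate(reader, start=1):
        if row["sentiment"] not in ALLOWED_SENTIMENTS:
            problems.append(f"row {row_no}: invalid sentiment {row['sentiment']!r}")
    return problems
```

A typical call would be `with open("reviews.csv", newline="") as f: print(check_output(f))`, where an empty list means the file has the expected shape.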


Upon completing the data pipeline, please provide recommendations on how TeleComLink can 
utilize the translated English reviews and the sentiment analysis to improve their chatbot system 
on SMS. Suggestions may include areas of improvement, customer support enhancements, or 
potential feature updates. 


Please post your completed data pipeline code on GitHub along with documentation explaining 
what the solution does and how to use it. 


Sample Dataset and Starter Code 


• Find a sample dataset to test your pipeline here. 
• Find the starter code here. For an extra level of difficulty, ignore the starter code and 
build your solution from scratch. 


Evaluation 


To assess your pipeline, the following criteria will be evaluated: 

• Successful extraction of the dataset using pandas. 
• Proper language translation from French to English using GPT-3 models. 
• Implementation of relevant data cleaning techniques to ensure English reviews only. 
• Accurate sentiment analysis and creation of the sentiment column. 
• Creation of the urgency column based on the sentiment analysis. 
• Correct loading of the final dataset with the updated columns. 
• Proper organization of code, adherence to best practices, and code readability. 


Sample Solution 


Here’s a sample solution that shows how you can build your pipeline: link. 


